gh-57911: Sanitize symlink targets in tarfile on win32#138309
Conversation
fac592d to
5956b5d
Compare
5956b5d to
2ccc462
Compare
|
Should this be done for hardlinks as well? I'd prefer saying “rewrite” rather than “sanitize”, as this is not fixing unsafe input. We should probably skip this for leading |
This problem doesn't seam to occur when creating hardlinks: >>> os.mkdir("tmp")
>>> os.mkdir("tmp\\child")
>>> open("tmp\\child\\test.txt", "w").write("hello world")
11
>>> os.link("tmp/child/test.txt", "tmp/testlink.txt")
>>> open("tmp\\testlink.txt").read()
'hello world'
On the one hand, I see the risk of security implications, but I also note that in pathlib, slashes are also replaced in UNC paths. |
encukou
left a comment
There was a problem hiding this comment.
OK; re-reading the duscussion I see experts suggesting to always replace, so let's go with that.
|
🤖 New build scheduled with the buildbot fleet by @encukou for commit 942b6e3 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F138309%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
| When extracting tar files on Windows Posix flavoured path separators in symlink | ||
| targets will be replaced by backward-slashes to prevent corrupted links. |
There was a problem hiding this comment.
| When extracting tar files on Windows Posix flavoured path separators in symlink | |
| targets will be replaced by backward-slashes to prevent corrupted links. | |
| When extracting tar files on Windows, slashes in symlink | |
| targets will be replaced by backslashes to prevent corrupted links. |
|
Looks good! Could you also add an entry to |
|
Thank you! |
|
I think this broke some buildbots: https://buildbot.python.org/#/builders/914. It wasn't detected until #138276 (which was 20h ago, but this change is 2 days ago and the previous buildbot run was 3 days ago) |
|
@picnixz if you open a revert PR as draft, you could use A |
|
I'm not on a dev session but I'll do it tomorrow if no one beats me to it. |
|
The test only fails, when running on a Windows system with disabled symlink support - therefore it didn't occure on GH actions. |
|
Shouldn't the inverse normalization of path separator characters be done in |
Which standard are you referring to?
Perhaps, but, I'm still fixing issues from the last time I changed long-standing behaviour in |
The tar format definition. I haven't checked the format definition (and I don't even know whether it exists), but all paths in a tar file are usually stored with
Think to this as more fallout from these changes. The normalization has always been missing, but it broke my application only since this PR has been merged (and Windows support for symlinks is more widespread: I would not have noticed it if not for the fact that GitHub Actions Windows runners have support for symlink enabled).
tar archives produced by code like os.symlink("../baz", "foo")
tar = tarfile.open("sample.tar", "w")
tar.add("foo")
tar.close()are currently invalid when created on Windows. In other words, they produce the wrong filesystem objects when extracted on a platform that uses Maybe code like this can be fixed specifying a I think that having the normalization only on extract is the worst possible outcome as it introduces unnecessary asymmetry: users of the |
Could you check, then? An actual reference to the standard would make this an easy sell -- that would mean that relevant people already thought about this. |
|
The tar standard is a POSIX standard. POSIX does not make backslashes special in any way; they are a valid character in a file name. Therefore, POSIX treats a link name of For example, https://man.freebsd.org/cgi/man.cgi?query=tar&apropos=0&sektion=5&manpath=FreeBSD+7.0-RELEASE&arch=default&format=html has and https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_271 defines
|
|
Found something in the POSIX standard as well (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_06), where directory entry type 2 "represents a symbolic link. The contents of the symbolic link shall be stored in the linkname field." Within tar files, the contents of the symbolic link are best described in the linkpath extension record, which is defined as follows: "The pathname of a link being created to another file, of any type, previously archived. This record shall override the linkname field in the following ustar header block(s)." In other words, the target of a symbolic link in a tar file should indeed be a pathname, and as such separated by forward slashes. |
|
Thanks @bonzini ! |
CC @encukou